WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 4 
C12N 15/00, 9/88 



A1



(11) International Publication Number: WO 88/02024

(43) International Publication Date: 24 March 1988 (24.03.88)



(21) International Application Number: PCT/GB87/00628 

(22) International Filing Date: 8 September 1987 (08.09.87) 

(31) Priority Application Number: 8621626 

(32) Priority Date: 8 September 1986 (08.09.86) 

(33) Priority Country: GB 

(71) Applicant (for all designated States except US): THE 

PUBLIC HEALTH LABORATORY SERVICE 
BOARD [GB/GB]; 61 Colindale Avenue, London 
NW9 5EQ (GB). 

(72) Inventors; and 

(75) Inventors/Applicants (for US only): ANSON, John [GB/GB]; 3 Avon
Terrace, Salisbury, Wiltshire (GB). GILBERT, Harold [GB/GB]; 16 Kells
Gardens, Low Fell, Gateshead, Tyne and Wear (GB). ORAM, Jon [GB/GB];
Candle Cottage, Wick Lane, Manningford Bohune, Pewsey, Wiltshire (GB).
MINTON, Nigel, Peter [GB/GB]; Thatch End, 16 Newton Tohey, Salisbury,
Wiltshire (GB).



(74) Agent: BECKHAM, R., W.; Ministry of Defence,
Procurement Executive, Patents 1A(4), Room 201
Empress State Building, Lillie Road, London SU
1TR (GB).



(81) Designated States: AT (European patent), BE (European patent),
CH (European patent), DE (European patent), DK, FR (European patent),
GB, GB (European patent), IT (European patent), JP, LU (European
patent), NL (European patent), NO, SE (European patent), US.



Published 

With international search report. 
Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt
of amendments.



(54) Title: PRODUCTION OF PHENYLALANINE AMMONIA LYASE 



[Figure: plasmid map showing the BamHI, EcoRI, PstI and HindIII sites and the positions of the start and stop codons]



(57) Abstract 

For use in genetic engineering a gene is provided, derived from a
PAL-producing strain of Rhodosporidium toruloides, from which non-coding
introns have been excised. The gene may be inserted into plasmid vectors
which may be introduced into heterologous organisms so that PAL is
expressed. A method of preparing the gene is provided, and its
polynucleotide sequence is listed.



FOR THE PURPOSES OF INFORMATION ONLY 



Codes used to identify States party to the PCT on the front pages of
pamphlets publishing international applications under the PCT.

AT  Austria                        FR  France                                  ML  Mali
AU  Australia                      GA  Gabon                                   MR  Mauritania
BB  Barbados                       GB  United Kingdom                          MW  Malawi
BE  Belgium                        HU  Hungary                                 NL  Netherlands
BG  Bulgaria                       IT  Italy                                   NO  Norway
BJ  Benin                          JP  Japan                                   RO  Romania
BR  Brazil                         KP  Democratic People's Republic of Korea   SD  Sudan
CF  Central African Republic       KR  Republic of Korea                       SE  Sweden
CG  Congo                          LI  Liechtenstein                           SN  Senegal
CH  Switzerland                    LK  Sri Lanka                               SU  Soviet Union
CM  Cameroon                       LU  Luxembourg                              TD  Chad
DE  Germany, Federal Republic of   MC  Monaco                                  TG  Togo
DK  Denmark                        MG  Madagascar                              US  United States of America
FI  Finland






WO 88/02024 



1 



PCT/GB87/00628 



PRODUCTION OF PHENYLALANINE AMMONIA LYASE 

This invention relates to genetic material which encodes the protein
phenylalanine ammonia lyase (herein abbreviated to 'PAL') and in
particular to such genetic material which lacks the intervening
non-coding DNA (introns) normally found in the PAL-encoding gene in
its natural state.

Phenylalanine ammonia lyase (PAL; EC 4.3.1.5), which occurs in plants,
yeasts, fungi, and streptomycetes, catalyzes the nonoxidative deamination
of L-phenylalanine to trans-cinnamic acid (see Gilbert et al., 1985).
The enzyme has a potential role in the treatment and diagnosis of
phenylketonuria (Ambrus et al., 1978) and has industrial applications in
the synthesis of L-phenylalanine from trans-cinnamic acid (Yamada et al.,
1981). In plants the enzyme, involved in flavonoid biosynthesis, is
induced by illumination, while in gherkin and mustard seedlings induction
is the result of activation of a constitutive pool of inactive enzyme
(Attridge et al., 1974). Illumination elicits de novo synthesis of the
enzyme in other botanical species (Schroder et al., 1979). Gherkin,
apple, sweet potato, and sunflower PAL is also regulated by a specific
inactivating system (Tan, 1980).

In some basidiomycete yeasts phenylalanine can act as sole source of
carbon, nitrogen, and energy. As PAL catalyzes the initial reaction in
the catabolism of the amino acid, the enzyme plays a key role in
regulating phenylalanine metabolism. In Rhodosporidium toruloides PAL is
induced by the presence of L-phenylalanine or L-tyrosine (Marusich et
al., 1981). Glucose, and ammonia in the presence of glucose, repress PAL
synthesis (Marusich et al., 1981), while induction of PAL activity is the
result of de novo synthesis of the enzyme rather than activation of an
inactive precursor or a decrease in the rate of PAL degradation (Gilbert
and Tully, 1982). Glucose represses PAL synthesis but has no effect upon
stability of the enzyme, whereas ammonia prevents uptake of phenylalanine
and so may repress enzyme synthesis through inducer exclusion (Gilbert
and Tully, 1982). In vitro translation of mRNA isolated from
R. toruloides grown under different physiological conditions showed that
phenylalanine, ammonia and glucose regulate PAL synthesis by adjusting
the level of functional PAL mRNA (Gilbert et al., 1983).

In recent years genetic engineering methods have been developed
whereby microorganisms which are common or which can easily be grown
on an industrial scale, in particular certain bacteria or yeasts,
have their genetic material (DNA sequences) modified so that they
produce a desired compound, eg a protein. Broadly this is achieved
by inserting into the host microorganism a plasmid consisting of
a gene, which is a polynucleotide sequence which encodes the compound,
together with other genetic material which instructs the host's
genetic apparatus to synthesise the compound.

The gene encoding PAL has recently been cloned as an 8.5 kb genomic PstI
fragment (Gilbert et al., 1985). These studies indicated that PAL is
synthesised from a monocistronic mRNA of 2.5 kb, and that the gene is
present as a single copy in the R. toruloides genome. The
introduction of the cloned PAL gene into both E. coli (Gilbert et al.,
1985) and Saccharomyces cerevisiae (Tully and Gilbert, 1985) did not
result in the production of PAL protein.

Although attempts have been made along these lines to introduce the
cloned PAL-encoding gene from R. toruloides into the microorganism
E. coli (Gilbert et al., 1985) and into the yeast Saccharomyces
cerevisiae (Tully and Gilbert, 1985), these heterologous hosts did
not then produce PAL protein.

It is an object of the invention to provide genetic material which
may be introduced into host organisms other than R. toruloides,
which hosts will then produce PAL protein. Other objects and
advantages of the invention will be apparent from the following
description.

According to a first aspect of the invention there is provided an
intron-free structural gene, derived from a corresponding
intron-containing structural gene from a eukaryotic microorganism, both
genes coding for the same gene product, provided that the intron-free
gene is capable of expressing the product within a prokaryotic
or eukaryotic microorganism. The gene product may be a chemical
compound the production of which is desired, for example a protein.

According to a second, preferred aspect of the invention there is
provided an intron-free structural gene which encodes PAL or a
polypeptide which displays PAL activity. The gene is preferably
derived from a PAL-producing strain of a eukaryotic organism,
most preferably a strain of R. toruloides.

A portion of the genetic DNA polynucleotide sequence of R. toruloides
is shown in Fig 3. The methods used by the inventors to determine
this sequence are described later. The PAL-encoding sequence extends
from the location marked "start codon" to the location marked "stop
codon", and the introns, six in number, are marked IVS 1 to IVS 6.
The amino acids encoded by these codons are shown, as are
various restriction sites. The gene of the second aspect of the
invention therefore preferably consists of a DNA sequence identical
to, related to, derived from or complementary to the sequence of
codons from the start codon to the stop codon in Fig 3, from which
the six introns IVS 1 to IVS 6 have been deleted, having the following
polynucleotide sequence:

CTC ACG GCC ATG ACG GTC GAA GCG ATG GTG GGC CAC GCC GGG TCG TTC
GAC CCC TTC CTT CAC GAC GTC ACG CGC CCT CAC CCG ACG GAG ATC GAA 
GTC GCG GGA AAC ATC CGC AAG CTC CTC GAG GGA AGC CGC TTT GOT GTC 
CAC CAT GAG GAG GAG GTC AAG GTC AAG GAC GAC GAG GGC ATT CTC CGC 

CAG GAC CGC TAC CCC TTG CGC ACG TCT CCT CAG TGG CTC GGC CCG CTC
GTC AGC GAC CTC ATT CAC GCC CAC GCC GTC CTC ACC ATC GAG GCC GGC 
CAG TCG ACG ACC GAC AAC CCT CTC ATC GAC GTC GAG AAC AAG ACT TCG 
CAC CAC GGC GGC AAT TTC CAG GCT GCC GCT GTG GCC AAC ACC ATG GAG 
AAG ACT CGC CTC GGG CTC GCC CAG ATC GGC AAG CTC AAC TTC ACG CAG 

CTC ACC GAG ATG CTC AAC GCC GGC ATG AAC CGC GGC CTC CCC TCC TGC
CTC GCG GCC GAA GAC CCC TCG CTC TCC TAC CAC TGC AAG GGC CTC GAC 
ATC GCC GCT GCG GCG TAC ACC TCG GAG TTG GGA CAC CTC GCC AAC CCT 
GTG ACG ACG CAT GTC CAG CCG GCT GAG ATG GCG AAC CAG GCG GTC AAC 
TCG CTT GCG CTC ATC TCG GCT CGT CGC ACG ACC GAG TCC AAC GAC GTC 

CTT TCT CTC CTC CTC GCC ACC CAC CTC TAC TGC GTT CTC CAA GCC ATC
GAC TTG CGC GCG ATC GAG TTC GAG TTC AAG AAG CAG TTC GGC CCA GCC 
ATC GTC TCG CTC ATC GAC CAG CAC TTT GGC TCC GCC ATG ACC GGC TCG 
AAC CTG CGC GAC GAG CTC GTC GAG AAG GTG AAC AAG ACG CTC GCC AAG 
CGC CTC GAG CAG ACC AAC TCG TAC GAC CTC GTC CCG CGC TGG CAC GAC 

GCC TTC TCC TTC GCC GCC GGC ACC GTC GTC GAG GTC CTC TCG TCG ACG
TCG CTC TCG CTC GCC GCC GTC AAC GCC TGG AAG GTC GCC GCC GCC GAG 
TCG GCC ATC TCG CTC ACC CGC CAA GTC CGC GAG ACC TTC TGG TCC GCC 
GCG TCG ACC TCG TCG CCC GCG CTC TCG TAC CTC TCG CCG CGC ACT CAG 
ATC CTC TAC GCC TTC GTC CGC GAG GAG CTT GGC GTC AAG GCC CGC CGC 

GGA GAC GTC TTC CTC GGC AAG CAA GAG GTG ACG ATC GGC TCG AAC GTC
TCC AAG ATC TAC GAG GCC ATC AAG TCG GGC AGG ATC AAC AAC GTC CTC
CTC AAG ATG CTC GCT TAG



It is well known in the field of genetics that DNA sequences which
are related to or derived from a defined sequence may encode the
same protein or a polypeptide having similar activity to that expressed
by the defined sequence. For example the related or derived sequence
may lack some bases or may include some additional bases. Also it
is known that the genetic code is degenerate, in that several codons
may encode the same amino acid. The related or derived sequence
may therefore contain some codons which are different from those listed
in Fig 3 but which preferably encode the same amino acid. Genes which are
related to or derived from this sequence of codons in one or more
of these ways are included in the invention.
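
The degeneracy described above can be illustrated with a short sketch in Python. It is an illustration only, not part of the disclosed method: it assumes the standard genetic code, uses the final six codons of the sequence listed above, and compares them with a hypothetical variant (invented here) carrying synonymous codon changes.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (an assumption of this sketch; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding strand codon by codon into one-letter symbols."""
    dna = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Final six codons of the sequence listed above.
listed = "CTC AAG ATG CTC GCT TAG"
# Hypothetical variant with synonymous changes in three codons.
variant = "CTT AAG ATG TTG GCC TAG"

print(translate(listed))   # LKMLA*
print(translate(variant))  # LKMLA*

# All codons that encode leucine under the standard code.
leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
print(leucine_codons)
```

Both strings translate to Leu-Lys-Met-Leu-Ala followed by a stop, although they differ at three of their eighteen bases.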

Genes related to or derived from this sequence may also be defined
in terms of the degree of conformity to this sequence. This is
preferably as high as possible, ideally 100%, but 70% or higher,
eg 85% or higher, conformity to that sequence is generally satisfactory.
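
The text does not say how the degree of conformity is to be measured. One simple reading, offered here only as an assumption, is the percentage of identical bases between two sequences that have already been aligned to the same length. The sketch below computes that figure; the function names and the example variant are hypothetical.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree."""
    ref = reference.replace(" ", "").upper()
    cand = candidate.replace(" ", "").upper()
    if not ref:
        raise ValueError("sequences must not be empty")
    if len(ref) != len(cand):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(ref, cand) if a == b)
    return 100.0 * matches / len(ref)


def conformity_band(identity: float) -> str:
    """Place an identity value in the bands mentioned in the text."""
    if identity >= 85.0:
        return "85% or higher"
    if identity >= 70.0:
        return "70% or higher"
    return "below 70%"


# Four codons from the listed sequence and a hypothetical one-base variant.
reference = "CTC AAG ATG CTC"
candidate = "CTT AAG ATG CTC"
identity = percent_identity(reference, candidate)
print(round(identity, 1), conformity_band(identity))
```

A measure that tolerates insertions and deletions would need a proper alignment step first, which this sketch deliberately leaves out.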

To enable a gene according to the first or second aspects of the
invention to be introduced into a host organism, it is common
to incorporate the gene into a recombinant DNA molecule. According to
a third aspect of the invention there is therefore provided a
recombinant DNA molecule, especially a plasmid, which contains a
gene according to the first or second aspects of the invention.

The plasmid according to this aspect of the invention may be used
as a vector to introduce the gene into a host and may therefore also
contain additional genetic material appropriate to a host into which
it is intended to introduce the plasmid. Such genetic material may
preferably contain an expression control sequence operatively linked
to said gene, and/or transcription/translation signals from other
genes appropriate to the organism into which the plasmid is to be
introduced and from which expression of the product, eg PAL, is
desired.

The structure of the plasmid according to this aspect of the invention
will vary according to the host organism for which it is to be used
as a vector, but by positioning the gene of the first or second aspect
of the invention downstream of the appropriate regulatory signals,
vectors may be prepared with which expression of R. toruloides PAL
may be obtained in any of the currently used production organisms.
These include E. coli K12, Bacillus subtilis, Saccharomyces cerevisiae,
Pseudomonas putida, Erwinia chrysanthemi and mammalian cell lines.

Similarly the regulating DNA sequences immediately upstream of the
PAL cDNA coding region in the plasmid will be composed of appropriate,
characterised transcription/translation signals. For example, for use
in S. cerevisiae a ribosome binding site (conforming to the sequence
CCACCTT) may be positioned at the appropriate position upstream of the
translational start of the PAL gene, and powerful transcriptional
signals, such as those derived from the S. cerevisiae phosphoglycerate
kinase and mating factor genes, placed 5' to the ribosome binding site.
The plasmid itself may use standard replicons (eg 2µ) and selectable
markers (eg Leu 2, Trp etc). Similarly, for use in E. coli use will be
made of the PL, tac, trp, rac or lac promoters, with appropriate
bacterial ribosome binding sites, and plasmids based on ColE1 (eg pBR322
and pUC plasmids), RSF1010, and runaway replicons of R1. As the introns
present in the natural PAL gene act as a barrier to the expression of
PAL in organisms other than R. toruloides, the invention may be used to
produce PAL in a wide range of prokaryotic and eukaryotic hosts which
are unable to express the natural PAL gene due to the presence of the
6 introns.

In accordance with a fourth aspect of the invention there is provided
a host organism, especially a strain of E. coli, Erwinia sp.,
Clostridia sp., Streptomyces sp., B. subtilis, B. stearothermophilus,
Pseudomonas, other microorganisms such as bacilli, yeasts, other
fungi, animal or plant hosts, and preferably a prokaryotic host,
transformed with at least one recombinant DNA molecule according
to the third aspect.

The invention also provides a process for the preparation of a gene
from which introns have been deleted which includes the steps of:
(i) isolating PAL mRNA from a strain of R. toruloides,
(ii) synthesising two intron-free complementary DNA ('cDNA')
sequences from the mRNA, the two cDNA sequences each
containing a portion of a gene which encodes PAL or a polypeptide
which displays PAL activity, the two portions together containing
the 3' and the 5' ends of the gene,
(iii) joining the two cDNA sequences produced in (ii) to form
an intron-free structural gene which encodes PAL or a
polypeptide which displays PAL activity.

The method used in step (ii) may be a cloning method which yields
the cDNA sequences contained in plasmids. In such a case the
sequences may be isolated from the plasmids which contain them by
cleavage of the plasmids at a suitable restriction site, followed
by ligation of the two sequences to form the gene. The gene may
then be combined with other genetic material to form a plasmid
containing it, for example following cleavage of a suitable known
plasmid such as pUC9 at appropriate sites. If desired the gene may
then in turn be excised from this plasmid and combined with yet other
genetic material to form other plasmids which may be used as vectors.
Standard recombinant DNA techniques, familiar to those skilled in
the art, may be used for the process of the invention.
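
The joining of two overlapping cDNA sequences at a shared restriction site can be sketched in Python as a string operation. This is an illustration under stated assumptions, not the laboratory procedure: it uses the FspI site named later in this description for pPAL3 (recognition sequence TGCGCA, cut bluntly between TGC and GCA), and the toy gene and clone sequences below are invented for the example.

```python
FSPI_SITE = "TGCGCA"   # FspI recognition sequence
FSPI_CUT_OFFSET = 3    # blunt cut between TGC and GCA


def cut_position(seq: str, site: str = FSPI_SITE, offset: int = FSPI_CUT_OFFSET) -> int:
    """Return the index at which the enzyme cuts; require a unique site."""
    if seq.count(site) != 1:
        raise ValueError("expected exactly one recognition site in the sequence")
    return seq.index(site) + offset


def join_at_shared_site(five_prime_clone: str, three_prime_clone: str) -> str:
    """Keep the 5' clone up to the cut and the 3' clone from the cut onwards."""
    left = five_prime_clone[:cut_position(five_prime_clone)]
    right = three_prime_clone[cut_position(three_prime_clone):]
    return left + right


# Hypothetical toy gene with a single FspI site, and two overlapping
# partial clones standing in for the 5' and 3' cDNA sequences.
TOY_GENE = "ATGACGGTCTGCGCAGGCTCGAACTAG"
five_prime_clone = TOY_GENE[:20]   # carries the 5' end and the site
three_prime_clone = TOY_GENE[5:]   # carries the site and the 3' end

joined = join_at_shared_site(five_prime_clone, three_prime_clone)
print(joined == TOY_GENE)
```

The check for a unique site mirrors the practical requirement that the chosen enzyme cuts each fragment once within the overlap; a sequence with several sites would need the intended one to be specified.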

The gene and/or plasmid produced in step (iii) of this process is
preferably one of the genes or plasmids encompassed by the second
and/or third aspects of the invention, and the cDNA sequences
produced in step (ii) are consequently preferably portions of these.
The cDNA sequences produced in step (ii) and plasmids containing
them are further aspects of the invention.

The invention therefore also includes DNA polynucleotide sequences, 
eg plasmids, the same as or substantially the same as or derived 
from or related to those produced by the process of the invention. 

The invention will now be described by way of example only with
reference to the accompanying figures:

Fig 1 is a schematic diagram illustrating how the genetic
DNA carrying the PAL gene was sequenced.

Fig 2 illustrates the production of the two plasmids
carrying the PAL gene which lack the intron sequences
of the natural gene.

Fig 3 shows the complete nucleotide sequence of the genomic
clone, the intron sequences removed in the invention
being labelled IVS1 to IVS6, and the corresponding amino
acid sequence of PAL.

Fig 4 shows the formation of the recombinant plasmid pPAL3
containing the intron-free gene, by combination of the
two cDNA plasmids pPAL1 and pPAL2.

Figs 5 & 6 illustrate the DNA nucleotide sequences of the
overlapping cDNA clones pPAL1 and pPAL2 respectively.

Fig 7 shows the expression of PAL protein from the plasmid
pPAL4.

In this description and the figures the following abbreviations are
used:

Amino acid       Symbol     Nucleotide base   Symbol
Alanine          Ala        Uracil            U
Arginine         Arg        Thymine           T
Asparagine       Asn        Cytosine          C
Aspartic acid    Asp        Adenine           A
Asn + Asp        Asx        Guanine           G
Cysteine         Cys
Glutamine        Gln
Glutamic acid    Glu
Gln + Glu        Glx
Glycine          Gly
Histidine        His
Isoleucine       Ile
Leucine          Leu
Lysine           Lys
Methionine       Met
Phenylalanine    Phe
Proline          Pro
Serine           Ser
Threonine        Thr
Tryptophan       Trp
Tyrosine         Tyr
Valine           Val



Referring to Figs 1 to 6 in more detail:

In Fig 1. Region 2 was isolated from the appropriate clone (pHG3),
circularised by treatment with T4 DNA ligase, fragmented by sonication,
and fragments of between 500 and 1000 bp inserted into M13mp8.
Region (1) was inserted into M13mp8 and M13mp9 as various specific
fragments utilising the restriction sites BamHI, BclI and SalI.
The sequence of the DNA spanning the BamHI site was obtained by
cloning the indicated fragment (3) into M13mp8.

In Fig 2. Clone 1 (pPAL1) was obtained by the method of Heidecker
and Messing (1983). Total mRNA from PAL-induced R. toruloides cells
was annealed to oligo(dT)-tailed pUC9, and the first strand cDNA
copy synthesised using reverse transcriptase in the presence of all
four dNTPs. The newly synthesised strands were tailed with oligo(dC)
using terminal deoxynucleotidyl transferase. Following fractionation
by an alkaline sucrose gradient, single-stranded plasmid DNA carrying
cDNA sequences was annealed to denatured oligo(dG)-tailed pUC9 and
the second strand synthesised using DNA polymerase (Klenow) and the
addition of all four dNTPs. Clone 2 (pPAL2) was constructed using
the procedure of Gubler and Hoffman (1983). The first strand cDNA
copy was synthesised using reverse transcriptase and a 19-mer
oligodeoxynucleotide primer (GATGAGAGGGTTGTCGGTC) complementary to
PAL mRNA. The RNA within the RNA-DNA hybrid was then nicked with
RNase H and the RNA strand replaced with DNA by E. coli DNA polymerase,
utilising the nicked RNA as a primer. The double stranded DNA was
then blunt ended by the action of T4 DNA polymerase, tailed with
oligo(dC), and annealed to oligo(dG)-tailed pBR322. cDNA clones
produced using both methods were transformed into E. coli JM83, and
colonies screened for PAL cDNA sequences using [α-32P]dATP-labelled
pHG3 restriction fragments.

In Fig 3. The determined amino acid sequences of the 5 randomly
derived peptide fragments are indicated by overlining of the relevant
residues. The introns are labelled IVS 1-6, and the sequence common
to all 6 is indicated by underlining of the relevant region. A
dashed overline in the 5' non-coding region represents the TC rich
region of the sequence, while the under- and overlining immediately
downstream marks a repetitive region. The sequence extends from
the most leftward BclI site of Figure 1 to the 3' end of cDNA clone
pPAL1 (Figure 2).

In Fig 4. The single lines of the pPAL1 and pPAL3 circular maps
represent pUC9 and pUC8 derived DNA, respectively, while the single
line of pPAL2 represents pBR322 DNA (see Fig. 2 for construction
of pPAL1 and pPAL2). The double line of pPAL2 represents the
"intron-free" 5' end of the PAL gene, while the thick line of
pPAL1 represents the 3' end of the gene. The 5' end of the PAL
gene was isolated as a 1.0 kb PstI-FspI fragment from pPAL2 and
ligated to a 1.25 kb FspI-BamHI fragment, isolated from pPAL1, which
carried the 3' end of the gene. The ligated fragment was inserted
between the BamHI and PstI sites of pUC8 to yield pPAL3. The positions
of the PAL gene translational start (ATG) and stop (TAG) codons (see
Fig. 3) are marked by arrows. The orientation of insertion of the
PAL gene is such that transcriptional read through from the vector
borne lac promoter (lac po) will not occur.

In Figs 5 & 6. The FspI site used to join these two clones to form
pPAL3 is indicated by underlining of the relevant nucleotides.

In Fig 7. The plasmid pPAL4 contains the complete PAL gene from
pPAL3 (Fig 4) cloned into pUC9 as an EcoRI-HindIII fragment such
that transcriptional read through from the adjacent lac promoter
can occur. Gene product formation was assessed using a
plasmid-directed in vitro translation kit obtained from Amersham
International PLC. Samples in the numbered tracks are as follows: 1, no
DNA added; 2, plasmid pUC9; 3, plasmid pPAL4. Molecular weights of the
protein markers are given as Mr.

In the following description reference will be made to the following
general procedures:

MICROBIAL STRAINS AND PLASMIDS

Microbial strains and plasmids used in accordance with the invention are
listed in Table 1.

MEDIA 

E. coli strains were cultured in L-broth (1% tryptone, 0.5% yeast
extract, 0.5% NaCl). Media were solidified with the addition of 2% (w/v)
Bacto-agar (Difco). Ampicillin (100 µg ml⁻¹) was used for the selection
and growth of transformants. Functional β-galactosidase was detected by
the addition of 5-bromo-4-chloro-indolyl-β-D-galactoside (X-Gal) to a
final concentration of 2 µg ml⁻¹.

CHEMICALS

[α-32P]dATP and the cDNA synthesis kit were obtained from Amersham
International. Agarose, restriction enzymes, T4 DNA ligase, terminal
deoxynucleotidyl transferase and 17-mer universal sequence primer were
purchased from Bethesda Research Laboratories. Klenow DNA polymerase was
from Boehringer Mannheim, while dT tailed pUC9 was from PL-Biochemicals.
Reverse transcriptase was purchased from Anglicon Biotechnology Ltd.
while all other reagents were obtained from Sigma Chemical Co. or BDH.

TABLE 1 Microbial Strains and Vectors Used

                 Relevant characteristics                      Source or reference
Strain
E. coli JM83     ara, (lac-pro), rpsL, thi, φ80d lacI ZM15     Vieira and Messing (1982)
E. coli JM101    (lac-pro), supE, thi, F' lacI ZM15 traD pro   Messing and Vieira (1982)

Plasmids
pUC9             AmpR                                          Vieira and Messing (1982)
pBR322           AmpR TetR                                     Bolivar et al. (1977)
pHG3             AmpR (PAL genomic clone)                      Gilbert et al. (1985)
pPAL1            AmpR (3' end PAL cDNA clone)                  Novel plasmid
pPAL2            AmpR (5' end PAL cDNA clone)                  Novel plasmid
pPAL3            AmpR (entire PAL cDNA gene)                   Novel plasmid
pPAL4            AmpR (entire PAL cDNA gene)                   Novel plasmid

Bacteriophage
M13mp8                                                         Messing and Vieira (1982)
M13mp9                                                         Messing and Vieira (1982)

All restriction enzymes and DNA/RNA modifying enzymes were used in the
buffers and under the conditions recommended by the suppliers. Plasmid
transformation techniques and all manipulation of DNA have previously
been described (Minton et al., 1984).

PLASMID DNA ISOLATION

E. coli plasmids were purified from 1 litre L-broth cultures
containing ampicillin by "Brij lysis" and subsequent CsCl density
gradient centrifugation (Clewell and Helinski, 1969). The rapid boiling
method of Holmes and Quigley (1981) was employed for small scale plasmid
isolation for screening purposes.

TEMPLATE GENERATION BY SONICATION

The DNA to be sequenced was fragmented into random blunt-ended fragments
by the procedure of Deininger (1983). The fragments obtained were cloned
into the SmaI site of M13mp8 and template DNA prepared as described by
Sanger et al. (1980).

NUCLEOTIDE SEQUENCING

Nucleotide sequencing was undertaken by the dideoxy method of Sanger et
al. (1980). The data obtained were compiled into a complete sequence using
the computer programmes of Staden (1980).

ISOLATION OF PAL mRNA

PAL mRNA was isolated as has previously been described (Gilbert et al.,
1985) employing publicly available R. toruloides strain IFO 0559
(equivalent to NCYC 1589 deposited at the National Collection of Yeast
Cultures, Norwich (GB) under the terms of the Budapest Treaty on 8
September 1986).

cDNA CLONING (SYNTHESIS)

i) Heidecker-Messing Method

The method utilised was essentially as described by Heidecker and Messing
(1983). Total mRNA from PAL induced R. toruloides cells was annealed to
oligo dT tailed pUC9 and the first strand cDNA copy synthesised using
reverse transcriptase in the presence of all 4 dNTPs. The newly
synthesised strands were tailed with oligo dC using terminal
deoxynucleotidyl transferase. Following fractionation by an alkaline
sucrose gradient, single stranded plasmid DNA carrying cDNA sequences
was annealed to denatured oligo dG tailed pUC9 and the second strand
synthesised using Klenow DNA polymerase and the addition of all 4 dNTPs.

ii) Gubler-Hoffman Method

The second method employed in the synthesis of cDNA was that of Gubler
and Hoffman (1983). The first strand cDNA copy was synthesised using
reverse transcriptase and a 19-mer primer (GATGAGAGGGTTGTCGGTC)
complementary to PAL mRNA. The RNA within the RNA-DNA hybrid was then
"nicked" with RNase H and the RNA strand replaced with DNA by E. coli DNA
polymerase, utilising the nicked RNA as a primer. The double stranded
DNA was then blunt-ended by the action of T4 DNA polymerase, tailed with
oligo dC and annealed to oligo dG tailed pBR322.
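
That a primer is complementary to the mRNA can be checked by taking its reverse complement and looking for it in the coding strand. The Python sketch below does this for the 19-mer quoted above against one line of the intron-free sequence listed earlier in this description; choosing that particular line as the excerpt is an assumption made for the illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# 19-mer primer as given in the Gubler-Hoffman method above.
PRIMER = "GATGAGAGGGTTGTCGGTC"

# One line of the listed intron-free coding sequence (spaces removed).
CODING_EXCERPT = (
    "CAG TCG ACG ACC GAC AAC CCT CTC ATC GAC GTC GAG AAC AAG ACT TCG"
).replace(" ", "")

# The primer anneals where its reverse complement occurs in the coding
# strand, which has the same sequence as the mRNA with T in place of U.
target = reverse_complement(PRIMER)
position = CODING_EXCERPT.find(target)

print(target)    # GACCGACAACCCTCTCATC
print(position)  # 8
```

A result of -1 from `find` would mean the primer does not match the excerpt, which is a quick way to catch a transcription error in either sequence.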

DETECTION OF PAL cDNA CLONES

Plasmid DNA carrying cDNA inserts was transformed into E. coli JM83 and
the AmpR transformants screened for PAL specific DNA. This was
undertaken by in situ colony hybridisation (Grunstein and Hogness, 1975),
utilising radiolabelled pHG3 DNA subfragments carrying portions of the
PAL gene.

AMINO ACID SEQUENCING

Peptide fragments of PAL protein, purified according to Gilbert et al.
(1985), were isolated as previously described (Minton et al., 1984).
Amino acid sequencing was undertaken by automated Edman degradation
using an Applied Biosystems gas phase sequencer, model 470A.

IN VITRO TRANSLATION

The bacterial cell-free coupled transcription-translation system used was
a modification of that first described by De Vries and Zubay (1967). The
E. coli S-30 extract and the supplement solutions required for in vitro
expression of genes contained on a bacterial plasmid were purchased as a
kit from Amersham International PLC. Proteins produced were labelled
with 35S-methionine (Amersham), and analysed by SDS-PAGE on 12%
acrylamide gels (Laemmli, 1970). Gels were dried prior to
autoradiography for 16 hours.

1. NUCLEOTIDE SEQUENCING OF THE PAL GENOMIC CLONE

The PAL gene was previously shown (Gilbert et al., 1985) to occupy a 2.5
kb region of DNA within a 6.7 kb BclI fragment cloned into pUC8 to yield
the recombinant plasmid pHG3 (see Fig. 1). The majority of the gene
resided on a 3 kb BamHI fragment, while the remaining 5' end of the gene
lay on a 0.7 kb BamHI-BclI fragment. Accordingly, the 3 kb fragment was
isolated from an appropriate clone (fragment 2, Fig. 1) and random
subfragments, generated by sonication (Deininger, 1983), cloned into
M13mp8. A total of some 250 templates were prepared and sequenced, the
data obtained being compiled into a complete sequence using the computer
programmes of Staden (1980). The sequence of the 5' end of the gene was
obtained by the site directed cloning of the relevant BclI-BamHI,
BclI-SalI and BamHI-SalI fragments (region 1, Fig. 1) into the
appropriate sites of M13mp8 and M13mp9. Sequence determination of the
DNA spanning the BamHI site was achieved by cloning the SalI-XhoI
fragment (3) indicated in Fig. 1.

The translation of the appropriate DNA strand of the sequenced region
indicated that an open reading frame (ORF) capable of coding for PAL was
not present. Confirmation that this region does encode PAL, however, was
obtained by comparing the translated amino acid sequences with the
determined sequence of 5 randomly derived peptide fragments. All 5
peptide sequences were located within the translated sequence but
occurred in various translational reading frames (Fig. 2.). The absence
of a contiguous ORF suggested that, in common with other fungal genes,
the PAL gene contains introns.
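
The reasoning above, that peptides found in different reading frames point to introns, can be sketched in Python. The example is a toy under stated assumptions: it uses the standard genetic code, and the two exons and the 11-base intron are invented for illustration rather than taken from the PAL gene. Because the intron length is not a multiple of three, the second exon falls into a different frame in the genomic sequence but returns to the first frame once the intron is removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (an assumption of this sketch; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def frames_containing(peptide: str, dna: str) -> list:
    """List the forward frames (0, 1, 2) whose translation contains the peptide."""
    return [frame for frame in range(3) if peptide in translate(dna, frame)]


# Hypothetical toy sequences.
EXON_1 = "ATGACGGTCGAA"   # encodes M T V E
INTRON = "GTGCGTAACAG"    # 11 bases, so it shifts the downstream frame
EXON_2 = "GCGATGGTGGGC"   # encodes A M V G

GENOMIC = EXON_1 + INTRON + EXON_2
CDNA = EXON_1 + EXON_2

print(frames_containing("MTVE", GENOMIC), frames_containing("AMVG", GENOMIC))
print(frames_containing("MTVE", CDNA), frames_containing("AMVG", CDNA))
```

In the genomic toy the two peptides are found in frames 0 and 2 respectively; in the intron-free version both lie in frame 0 as one continuous open reading frame.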

2 . ISOLATION OF cDNA CLONES CARRYING THE PAL GENE 

30 

To enable the identification of the PAL intervening sequences we elected
to reclone the gene from cDNA. In the initial experiments the procedure
of Heidecker and Messing (1983) was adopted, utilising the vector pUC9 and
purified PAL mRNA. Clones carrying PAL DNA sequences were identified
utilising the 3 kb BamHI-BclI fragment of pHG3 as a DNA probe. The
largest clone obtained, pPAL1, contained some 1.3 kb from the 3' end of
the PAL gene (Fig. 3). The 5' end of the gene was obtained by cloning
C-tailed cDNA, prepared by the method of Gubler and Hoffman (1983), into
G-tailed pBR322, to yield pPAL2. In this case the primer utilised during
first strand synthesis was a synthesised 19-mer oligonucleotide
complementary to the PAL coding strand 150 bp downstream from the 5' end
of the previously obtained cDNA (see Fig. 3). The nucleotide sequence
of the two cDNA clones was determined by site-directed cloning of
appropriate restriction fragments into M13mp8 and M13mp9.
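
The design of a primer complementary to the coding strand can be sketched as follows. This is only an illustration: the sequence is a made-up toy, the offset is arbitrary, and the names `reverse_complement` and `antisense_primer` are hypothetical. The actual 19-mer used is not given in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def antisense_primer(coding_strand, offset, length=19):
    """Return a primer complementary to `length` bases of the coding strand
    beginning `offset` bases from its 5' end."""
    target = coding_strand[offset:offset + length].upper()
    if len(target) < length:
        raise ValueError("coding strand too short for the requested primer")
    return reverse_complement(target)


# Toy coding strand (hypothetical, 30 bases) and an arbitrary offset.
toy_coding = "ATGGCTCCAGGTACCAACTCACAGTCTCAA"
primer = antisense_primer(toy_coding, offset=5)
print(primer, len(primer))
```

Because such a primer anneals to the mRNA, first strand synthesis extends from it towards the 5' end of the message, which is why a primer placed near the 5' end of an existing clone recovers the missing upstream region.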



3. IDENTIFICATION OF THE PAL INTRONS



Sequence determination of the 2 clones confirmed the presence of 6
introns within the PAL coding sequence. Thus the 6 regions of DNA
labelled IVS1 to IVS6 were completely absent from the appropriate regions
of pPAL1 and pPAL2. Examination of the 6 missing regions revealed that
they all contained the nucleotides CAG at their 3' ends, exhibiting
perfect agreement with the consensus intron acceptor sequence generally
observed in eukaryotic genes (Mount, 1982). The sequences at the 5' ends
of some of these introns demonstrated less conformity to the
eukaryotic consensus donor sequence (GTA/GAGT). Thus the donor sequences
of IVS 2, 4 and 5 were GTGCGT, GTGCGC and GTGCGC respectively. The
introns of eukaryotic genes have been generally shown to contain
sequences necessary for the accurate splicing of the intervening
non-coding regions. Sequences conforming to consensus sequences observed
in the introns of other eukaryotes (e.g. TACTTAACA in S. cerevisiae; see
Orbach et al., 1986) are not present in the R. toruloides introns. In
their place a sequence is present conforming to the consensus G/ANG/CTGAC
(the relevant sequence within each intron has been underlined in Fig. 3).
Such a sequence may be specific to R. toruloides and closely related
organisms.
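
The comparison of intron boundaries against the consensus sequences quoted above can be sketched in Python. The consensus patterns are taken from the text; the function name `describe_intron` and the toy intron are hypothetical, and the real IVS1 to IVS6 sequences are not reproduced here.

```python
import re

# Consensus donor GTA/GAGT: allowed bases at each of the six positions.
DONOR_POSITIONS = ["G", "T", "AG", "A", "G", "T"]
# Internal consensus G/ANG/CTGAC, where N is any base.
INTERNAL = re.compile(r"[AG][ACGT][CG]TGAC")


def describe_intron(seq):
    """Summarise how an intron's ends and interior match the consensus."""
    seq = seq.upper()
    donor = seq[:6]
    mismatches = sum(
        1 for base, allowed in zip(donor, DONOR_POSITIONS) if base not in allowed
    )
    return {
        "donor": donor,
        "donor_mismatches": mismatches,
        "acceptor_is_CAG": seq.endswith("CAG"),
        "internal_sites": [m.start() for m in INTERNAL.finditer(seq)],
    }


# The donor sequences reported for IVS 2, 4 and 5.
for donor in ("GTGCGT", "GTGCGC", "GTGCGC"):
    print(donor, describe_intron(donor)["donor_mismatches"])

# Toy intron (hypothetical) carrying one internal site and a CAG 3' end.
print(describe_intron("GTGCGTTTTTGACTGACTTTTCAG"))
```

On this count GTGCGT departs from the donor consensus at one position and GTGCGC at two, which is the reduced conformity noted in the text.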

The PAL gene has been shown not to express in either E. coli (Gilbert et
al., 1985) or S. cerevisiae (Tully and Gilbert, 1985). The lack of
expression in the former is undoubtedly due to the presence of
introns in the PAL gene. Furthermore, although S. cerevisiae is capable
of splicing introns, the differences between the nucleotide sequences of
the PAL introns and those found in S. cerevisiae introns probably explain
the inability of this yeast to express the R. toruloides PAL gene.




4. DERIVATION OF A CONTIGUOUS cDNA PAL GENE

The procedure utilised in the cloning of the PAL gene from cDNA had
resulted in two clones: pPAL1, which carried the 3' end of the gene, and
pPAL2, carrying the 5' end of the gene. A third plasmid, carrying the
entire PAL structural gene, was constructed by amalgamating the inserts of
the above two plasmids. This was achieved by isolating a 1.0 kb FspI-PstI
fragment from pPAL2, carrying the 5' end of PAL, and ligating it to
a 1.25 kb FspI-BamHI fragment carrying the 3' end of the gene isolated
from pPAL1. The ligated DNA was then inserted into pUC9 cleaved with
PstI and BamHI (Fig. 4). The plasmid pPAL3 therefore carries the entire
PAL structural gene, but lacks all 6 introns found in the natural
R. toruloides chromosomal gene.
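
The relationship between the chromosomal gene and an intron-free gene can be illustrated with a short Python sketch. It assumes the intron coordinates are already known, uses a made-up toy sequence, and the names `remove_introns` and `is_contiguous_orf` are hypothetical. It models the outcome only, not the cloning procedure itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def remove_introns(genomic, introns):
    """Delete each half-open (start, end) interval from a genomic sequence."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        if start < pos or end < start or end > len(genomic):
            raise ValueError("intron intervals overlap or run off the sequence")
        pieces.append(genomic[pos:start])
        pos = end
    pieces.append(genomic[pos:])
    return "".join(pieces)


def is_contiguous_orf(seq):
    """True if seq runs from ATG to a stop codon with no stop in between."""
    seq = seq.upper()
    if len(seq) % 3 or not seq.startswith("ATG") or seq[-3:] not in STOP_CODONS:
        return False
    internal = (seq[i:i + 3] for i in range(3, len(seq) - 3, 3))
    return all(codon not in STOP_CODONS for codon in internal)


# Toy gene (not PAL sequence): one 10-base intron at positions 9 to 18.
genomic = "ATGGCGCCT" + "GTGCGTACAG" + "ACCTCGCAG" + "TAG"
spliced = remove_introns(genomic, [(9, 19)])
print(spliced, is_contiguous_orf(genomic), is_contiguous_orf(spliced))
```

In this toy case the intron-containing sequence fails the open reading frame test while the spliced sequence passes it, mirroring the contrast drawn between the chromosomal gene and the gene carried by pPAL3.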

5. SYNTHESIS OF PAL PROTEIN

The fragment containing the complete, intron-free PAL gene from pPAL3 has
been cloned into the pUC plasmids in both orientations relative to the
lac promoter, to give pPAL3 (pUC8) and pPAL4 (pUC9). In pPAL4, the PAL
gene is in phase with the lacZ promoter from pUC9, and synthesis of PAL
protein has been demonstrated in a plasmid-directed in vitro translation
system. This is shown in Fig. 7. With the PAL gene in the opposite
orientation (pPAL3) no PAL protein is produced. We are currently
developing vector systems to enable us to express the PAL gene in
Saccharomyces cerevisiae.

CLAIMS

1. A gene, characterised in that it is an intron-free structural 
gene, derived from a corresponding intron-containing structural gene 
from a eukaryotic microorganism, both genes encoding the same product 
provided that the intron-free gene is capable of expressing the product 
within a prokaryotic or eukaryotic microorganism. 

2. A gene as claimed in claim 1, characterised in that the gene
encodes phenylalanine ammonia lyase ('PAL') or a polypeptide which
displays PAL activity.

3. A gene as claimed in claim 2, characterised in that it is derived 
from an intron-containing gene of a PAL-producing strain of a eukaryotic 
microorganism. 

4. A gene as claimed in claim 3, characterised in that the
microorganism is R. toruloides.

5. A gene characterised in that it has a structure identical to,
related to, derived from or complementary to the following
polynucleotide sequence:

ATG GCG CCT CCA CCA ACC TCG CAG TCG CAG GCT CGC ACC TGC CCC ACA
CAG ACG CAG GTC GAC ATC GTC GAG AAG ATG CTC GCC GCG GCG
ACC GAC TCG ACG CTC GAA CTC GAC GGC TAC TCG
GTC TAC ACG ACT TTT GGC GGA TCC
GAC ACC CGC ACC GAG GAC TCG CTC CAG AAG GCT CTC CTC
GAG CAG GGT CTT GAG GTT

CTC ACG GCC ATG ACG GTG GAA GGG ATG GTC GGG CAC GCC GGG TCG TTC 
CAC CCC TTC CTT CAC GAC GTC ACG CGC CCT CAC CCG ACG GAG ATC GAA 
GTC GCG GGA AAC ATC CGC AAG CTC CTC GAG GGA AGC CGC TTT GCT GTC 
CAC CAT GAG GAG GAG GTC AAG GTC AAG GAC GAC GAG GGC ATT CTC CGC 
CAG GAC CGC TAC CCC TTG CGC ACG TCT CCT CAG TGG CTC GGC CCG CTC 
GTC AGC GAC CTC ATT CAC GCC CAC GCC GTC CTC ACC ATC GAG GCC GGC 
CAG TCG ACG ACC GAC AAC CCT CTC ATC GAC GTC GAG AAC AAG ACT TCG 
CAC CAC GGC GGC AAT TTC CAG GCT GCC GCT GTG GCC AAC ACC ATG GAG 
AAG ACT CGC CTC GGG CTC GCC CAG ATC GGC AAG CTC AAC TTC ACG CAG 
CTC ACC GAG ATG CTC AAC GCC GGC ATG AAC CGC GGC CTC CCC TCC TGC 
CTC GCG GCC GAA GAC CCC TCG CTC TCC TAC CAC TGC AAG GGC CTC GAC 
ATC GCC GCT GCG GCG TAC AGC TCG GAG TTG GGA CAC CTC GCC AAC CCT 
GTG ACG ACG CAT GTC CAG CCG GCT GAG ATG GCG AAC CAG GCG GTC AAC 
TCG CTT GCG CTC ATC TCG GCT CGT CGC ACG ACC GAG TCC AAC GAC GTC 
CTT TCT CTC CTC CTC GCC ACC CAC CTC TAC TGC GTT CTC CAA GCC ATC 
GAC TTG CGC GCG ATC GAG TTC GAG TTC AAG AAG CAG TTC GGC CCA GCC 
ATC GTC TCG CTC ATC GAC CAG CAC TTT GGC TCC GGC ATG ACC GGC TCG 
AAC CTG CGC GAC GAG CTC GTC GAG AAG GTG AAC AAG ACG CTC GCC AAG 
CGC CTC GAG CAG ACC AAC TCG TAC GAC CTC GTC CCG CGC TGG CAC GAC 
GCC TTC TCC TTC GCC GCC GGC ACC GTC GTC GAG GTC CTC TCG TCG ACG 
TCG CTC TCG CTC GCC GCC GTC AAC GCC TGG AAG GTC GCC GCC GCC GAG 
TCG GCC ATC TCG CTC ACC CGC CAA GTC CGC GAG ACC TTC TGG TCC GCC 
GCG TCG ACC TCG TCG CCC GCG CTC TCG TAC CTC TCG CCG CGC ACT CAG 
ATC CTC TAC GCC TTC GTC CGC GAG GAG CTT GGC GTC AAG GCC CGC CGC 
GGA GAC GTC TTC CTC GGC AAG CAA GAG GTG ACG ATC GGC TCG AAC GTC 
TCC AAG ATC TAC GAG GCC ATC AAG TCG GGC AGG ATC AAC AAC GTC CTC 
CTC AAG ATG CTC GCT TAG 

which encodes PAL or a polypeptide which displays PAL activity. 

6. A gene as claimed in claim 5 characterised in that it lacks some
bases or includes some additional bases or has some of the listed 
codons replaced by other codons, provided that the gene encodes PAL 
or a polypeptide displaying PAL activity. 

7. A recombinant DNA molecule characterised in that it contains
a gene as claimed in any one of claims 1 to 4.

8. A recombinant DNA molecule characterised in that it contains
a gene as claimed in claim 5 or claim 6.

9. A molecule as claimed in claim 8 characterised in that it is 
a plasmid. 

10. A plasmid as claimed in claim 9 characterised in that it is a
vector and also contains an expression control sequence operatively
linked to the gene, and/or transcription/translation signals appropriate
of PAL, or a polypeptide which displays PAL activity, from E. coli K12,
Bacillus subtilis, Saccharomyces cerevisiae, Pseudomonas putida,
Erwinia chrysanthemi or mammalian cell lines.

11. A molecule as claimed in claim 10 characterised in that it contains
a ribosome binding site upstream of the translational start of the
gene and transcriptional signals derived from the S. cerevisiae
phosphoglycerate kinase and mating factor genes placed 5' to the
ribosome binding site.

12. A recombinant DNA molecule characterised in that it consists
of a gene as claimed in claim 5 inserted into the plasmid pUC9 with
the gene in phase with the lacZ promoter of pUC9.

13. A host microorganism characterised in that it is transformed 
with a recombinant DNA molecule as claimed in claim 8. 

14. A process for the preparation of a gene from which introns have
been deleted characterised in that it includes the steps of:

(i) isolating PAL mRNA from a strain of R. toruloides.

(ii) synthesising two intron-free cDNA sequences from the mRNA,
the two cDNA sequences each containing a portion of a gene which
encodes PAL or a polypeptide which displays PAL activity, the
two portions together containing the 3' and the 5' ends of the gene.

(iii) joining the two cDNA sequences to form an intron-free
structural gene which encodes PAL or a polypeptide which displays
PAL activity.

15. A polynucleotide sequence characterised in that it is a 
portion of an intron-free gene which encodes PAL or a polypeptide 
which displays PAL activity and contains the 3' or the 5' end of 
the gene. 

16. A sequence as claimed in claim 15 characterised in that 
it contains a polynucleotide sequence identical to, related to 
or derived from the following polynucleotide sequence: 

ATG GCG CCT GGA CCA ACC TCG CAG TCG CAG GCT CGC AGC TGC CCC ACA 
ACC CAG GTC ACG CAG GTC GAC ATC CTC GAG AAG ATG CTC GCC GCG CCG 
ACC GAC TCG ACG CTC GAA CTC GAC GGC TAC TGG CTC AAC CTC GGA GAC 
GTC GTC TCG GCC GCG AGG AAG GGC AGG CCT GTC CGC GTC AAG GAC AGC 
GAC GAG ATC CGC TCA AAG ATT GAC AAA TCG GTC GAG TTC TTG CGC TCG
CAA CTC TCC ATG AGC GTC TAC GGC GTC ACG ACT GGA TTT GGC GGA TCC 
GCA GAC ACC CGC ACC GAG GAC GCC ATC TCG CTC CAG AAG GCT CTC CTC 
GAG CAC CAG CTC TGC GCT GTT CTC CCT TCG TCG TTC GAC TCG TTC CGC 
CTC GGC CGC GGT CTC GAG AAC TCG CTT CCC CTC GAG GTT GTT CGC GGC 
GCC ATG ACA ATC CGC GTC AAC AGC TTG ACC CGC GGC CAC TCG GCT GTC 
CGC CTC GTC GTC CTC GAG GCG CTC ACC AAC TTC CTC AAC CAC GGC ATC 
ACC CCC ATC GTC CCC CTC CGC GGC ACC ATC TCT GCG TCG GGC GAC CTC 
TCT CCT CTC TCC TAC ATT GCA GCG GCC ATC AGC GGT CAC CCG GAC AGC 
AAG GTG CAC GTC GTC CAC GAG GGC AAG GAG AAG ATC CTG TAC GCC CGC 
GAG GCG ATG GCG CTC TTC AAC CTC GAG CCC GTC GTC CTC GGC CCG AAG 
GAA GGT CTC GGT CTC GTC AAC GGC ACC GCC GTC TCA GCA TCG ATG GCC 
ACC CTC GCT CTG CAC GAC GCA CAC ATG CTC TCG CTC CTC TCG CAG TCG
CTC ACG GCC ATG ACG GTC GAA GCG ATG GTC GGC CAC GCC GGC TCG TTC
CAC CCC TTC CTT CAC GAC GTC ACG CGC CCT CAC CCG ACG CAG ATC GAA 
GTC GCG GGA AAC ATC CGC AAG CTC CTC GAG GGA AGC CGC TTT GCT GTC 
CAC CAT GAG GAG GAG GTC AAG GTC AAG GAC GAC GAG GGC ATT CTC CGC 
CAG GAC CGC TAC CCC TTG CGC ACG.

17. A sequence as claimed in claim 15 characterised in that 
it contains a polynucleotide sequence identical to, related to 
or derived from the following polynucleotide sequence: 




TCT CCT CAG TGG CTC GGC CCG CTC 
GTC AGC GAC CTC ATT CAC GCC CAC GGC CTC CTC ACC ATC GAG GCC GGC 
GAG TCG ACG ACC GAC AAC CCT CTC ATC GAC GTC GAG AAC AAG ACT TGG 
CAC CAC GGC GGC AAT TTC CAG GCT GCC GCT GTG GCC AAC ACC ATG GAG 
AAG ACT CGC CTC GGG CTC GCC CAG ATC GGC AAG CTC AAC TTC ACG CAG 
CTC ACC GAG ATG CTC AAC GCC GGC ATG AAC CGC GGC CTC CCC TCC TGC 
CTC GCG GCC GAA GAC CCC TCG CTC TCC TAC CAC TGC AAG GGC CTC GAC 
ATC GCC GCT GCG GCG TAC ACC TCG GAG TTG GCA CAC CTC GCC AAC CCT 
GTG ACG ACG CAT GTC CAG CCG GCT GAG ATG GCG AAC CAG GCG GTC AAC 
TCG CTT GCG CTC ATC TCG GCT CCT CGC ACG ACC GAG TCC AAC GAC GTC 
CTT TCT CTC CTC CTC GCC ACC CAC CTC TAC TGC GTT CTC CAA GCC ATC 
GAC TTG CGC GCG ATC GAG TTC GAG TTC AAG AAG CAG TTC GGC CCA GCC 
ATC GTC TCG CTC ATC GAC CAG CAC TTT GGC TCC GCC ATG ACC GGC TCG 
AAC CTG CGC GAC GAG CTC GTC GAG AAG GTG AAC AAG ACG CTC GCC AAG 
CGC CTC GAG CAG ACC AAC TCG TAC GAC CTC GTC CCG CGC TGG CAC GAC 
GCC TTC TCC TTC GCC GCC GGC ACC GTC GTC GAG GTC CTC TCG TCG ACG 
TCG CTC TCG CTC GCC GCC GTC AAC GCC TGG AAG GTC GCC GCC GCC GAG 
TCG GCC ATC TCG CTC ACC CGC CAA GTC CGC GAG ACC TTC TGG TCC GCC
GCG TCG ACC TCG TCG CCC GCG CTC TCG TAC CTC TCG CCG CGC ACT CAG 
ATC CTC TAC GCC TTC GTC CGC GAG GAG CTT GGC GTC AAG GCC CGC CGC 
GGA GAC GTC TTC CTC GGC AAG CAA GAG GTG ACG ATC GGC TCG AAC GTC 
TCC AAG ATC TAC GAG GCC ATC AAG TCG GGC AGG ATC AAC AAC GTC CTC 
CTC AAG ATG CTC GCT TAG.

18. A polynucleotide sequence as claimed in claim 16 or 17 
characterised in that it lacks some bases or includes other bases 
or has some of the listed codons replaced by other codons. 

19. A recombinant DNA molecule characterised in that it contains 
a polynucleotide sequence as claimed in claim 15. 

20. A recombinant DNA molecule characterised in that it contains
a polynucleotide sequence as claimed in claim 16 or 17.

21. A recombinant DNA molecule as claimed in claim 20 characterised 
in that the polynucleotide sequence is combined with pBR322 or pUC9. 